OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Feature string-based intelligent information retrieval from Tamil document images

Internal identifier: 000A82 (Main/Exploration); previous: 000A81; next: 000A83

Authors: S. Abirami [India]; D. Manjula [India]

Source: International journal of computer applications in technology (ISSN 0952-8091), 2009

RBID : Pascal:10-0181228

Abstract

Information Retrieval (IR) from document images has become an increasingly popular and challenging problem. This paper proposes a simple and effective method to extract text and perform intelligent IR from Tamil document images without Optical Character Recognition (OCR). The method generates a feature string for every word image by extracting its features. It relies on the basic characteristics, or shapes, of the letters instead of recognising the letters as OCR does. The strength of the technique lies in extracting text based on basic features, such as lines and black-and-white disposition rates in characters, which are almost the same for a given character across font sizes and font faces. In an offline process, document images are preprocessed, and a text extraction step extracts features from the word images based on their shapes and stores them in temporary files. During online retrieval, a textual keyword is obtained from the user and its primitive string is framed. IR is then performed on the basis of this primitive string, and the resulting images are returned to the user. This technique could be readily adopted for IR in large digital libraries.
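
The offline/online split described in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: the abstract does not specify the features, so the two used here (a full-height vertical stroke flag and a quantised black-pixel density) are stand-ins chosen for illustration, and every name (`split_columns`, `glyph_code`, `feature_string`, `build_index`, `frame_query`, `retrieve`, the `letter_codes` table) is hypothetical. Images are assumed to be already binarised lists of 0/1 rows.

```python
from typing import Dict, List

Image = List[List[int]]  # rows of pixels: 0 = white, 1 = black


def split_columns(word: Image) -> List[Image]:
    """Split a word image into glyph images at fully white columns."""
    width = len(word[0])
    glyphs, start = [], None
    for x in range(width + 1):
        inked = x < width and any(row[x] for row in word)
        if inked and start is None:
            start = x  # a glyph begins at this column
        elif not inked and start is not None:
            glyphs.append([row[start:x] for row in word])
            start = None
    return glyphs


def glyph_code(glyph: Image) -> str:
    """Encode a glyph as two symbols: stroke flag, then density bucket."""
    height, width = len(glyph), len(glyph[0])
    density = sum(map(sum, glyph)) / (height * width)  # black-pixel ratio
    bucket = "abcd"[min(3, int(density * 4))]
    has_line = any(all(row[x] for row in glyph) for x in range(width))
    return ("L" if has_line else "o") + bucket


def feature_string(word: Image) -> str:
    """Concatenate the glyph codes of a word image."""
    return "".join(glyph_code(g) for g in split_columns(word))


def build_index(words: Dict[str, Image]) -> Dict[str, List[str]]:
    """Offline step: map each feature string to the word images producing it."""
    index: Dict[str, List[str]] = {}
    for word_id, image in words.items():
        index.setdefault(feature_string(image), []).append(word_id)
    return index


def frame_query(keyword: str, letter_codes: Dict[str, str]) -> str:
    """Online step: frame the primitive string of a textual keyword."""
    return "".join(letter_codes[ch] for ch in keyword)


def retrieve(index: Dict[str, List[str]], primitive: str) -> List[str]:
    """Return the ids of word images whose feature string matches exactly."""
    return index.get(primitive, [])


def scale(image: Image, k: int) -> Image:
    """Enlarge an image k times, mimicking a larger font size."""
    return [[p for p in row for _ in range(k)] for row in image for _ in range(k)]
```

Because both stand-in features are ratios or shape flags rather than absolute pixel counts, a word image and its enlarged copy (`scale(image, 2)`) yield the same feature string in this sketch, which mirrors the size-independence the abstract claims for its own features. Exact-match lookup is a simplification; a practical system would likely need approximate matching.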



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Feature string-based intelligent information retrieval from Tamil document images</title>
<author>
<name sortKey="Abirami, S" sort="Abirami, S" uniqKey="Abirami S" first="S." last="Abirami">S. Abirami</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Computer Science &amp; Engineering, College of Engineering, Anna University</s1>
<s2>Chennai 600025</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Chennai 600025</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Manjula, D" sort="Manjula, D" uniqKey="Manjula D" first="D." last="Manjula">D. Manjula</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Computer Science &amp; Engineering, College of Engineering, Anna University</s1>
<s2>Chennai 600025</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Chennai 600025</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">10-0181228</idno>
<date when="2009">2009</date>
<idno type="stanalyst">PASCAL 10-0181228 INIST</idno>
<idno type="RBID">Pascal:10-0181228</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000192</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000585</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000198</idno>
<idno type="wicri:doubleKey">0952-8091:2009:Abirami S:feature:string:based</idno>
<idno type="wicri:Area/Main/Merge">000A92</idno>
<idno type="wicri:Area/Main/Curation">000A82</idno>
<idno type="wicri:Area/Main/Exploration">000A82</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Feature string-based intelligent information retrieval from Tamil document images</title>
<author>
<name sortKey="Abirami, S" sort="Abirami, S" uniqKey="Abirami S" first="S." last="Abirami">S. Abirami</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Computer Science &amp; Engineering, College of Engineering, Anna University</s1>
<s2>Chennai 600025</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Chennai 600025</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Manjula, D" sort="Manjula, D" uniqKey="Manjula D" first="D." last="Manjula">D. Manjula</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Computer Science &amp; Engineering, College of Engineering, Anna University</s1>
<s2>Chennai 600025</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Chennai 600025</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">International journal of computer applications in technology</title>
<title level="j" type="abbreviated">Int. j. comput. appl. technol.</title>
<idno type="ISSN">0952-8091</idno>
<imprint>
<date when="2009">2009</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">International journal of computer applications in technology</title>
<title level="j" type="abbreviated">Int. j. comput. appl. technol.</title>
<idno type="ISSN">0952-8091</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Character recognition</term>
<term>Character string</term>
<term>Document retrieval</term>
<term>Electronic library</term>
<term>Extraction process</term>
<term>Image analysis</term>
<term>Image processing</term>
<term>Image recognition</term>
<term>Information retrieval</term>
<term>Keyword</term>
<term>Letter</term>
<term>Natural language</term>
<term>Optical character recognition</term>
<term>Optical image</term>
<term>Pattern extraction</term>
<term>Text</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Chaîne caractère</term>
<term>Recherche information</term>
<term>Recherche documentaire</term>
<term>Langage naturel</term>
<term>Texte</term>
<term>Reconnaissance image</term>
<term>Image optique</term>
<term>Reconnaissance optique caractère</term>
<term>Reconnaissance caractère</term>
<term>Analyse image</term>
<term>Traitement image</term>
<term>Bibliothèque électronique</term>
<term>Lettre alphabet</term>
<term>Procédé extraction</term>
<term>Mot clé</term>
<term>Extraction forme</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Recherche documentaire</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Information Retrieval (IR) from document images has become an increasingly popular and challenging problem. This paper proposes a simple and effective method to extract text and perform intelligent IR from Tamil document images without Optical Character Recognition (OCR). The method generates a feature string for every word image by extracting its features. It relies on the basic characteristics, or shapes, of the letters instead of recognising the letters as OCR does. The strength of the technique lies in extracting text based on basic features, such as lines and black-and-white disposition rates in characters, which are almost the same for a given character across font sizes and font faces. In an offline process, document images are preprocessed, and a text extraction step extracts features from the word images based on their shapes and stores them in temporary files. During online retrieval, a textual keyword is obtained from the user and its primitive string is framed. IR is then performed on the basis of this primitive string, and the resulting images are returned to the user. This technique could be readily adopted for IR in large digital libraries.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Inde</li>
</country>
</list>
<tree>
<country name="Inde">
<noRegion>
<name sortKey="Abirami, S" sort="Abirami, S" uniqKey="Abirami S" first="S." last="Abirami">S. Abirami</name>
</noRegion>
<name sortKey="Manjula, D" sort="Manjula, D" uniqKey="Manjula D" first="D." last="Manjula">D. Manjula</name>
</country>
</tree>
</affiliations>
</record>
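
Outside the Dilib toolchain, a record of this shape can also be read with a general-purpose XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`, run on a trimmed excerpt of the record above. The `xmlns:wicri` declaration and its URI are placeholders added here (an assumption, not part of the record) so that the `wicri:` prefix parses, and `summarize` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the record. The xmlns:wicri URI is a placeholder
# (assumption) added only so that the wicri: prefix is bound.
RECORD = """<record xmlns:wicri="urn:example:wicri">
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Feature string-based intelligent information retrieval from Tamil document images</title>
<author>
<name sortKey="Abirami, S" first="S." last="Abirami">S. Abirami</name>
<affiliation wicri:level="1"><country>Inde</country></affiliation>
</author>
<author>
<name sortKey="Manjula, D" first="D." last="Manjula">D. Manjula</name>
<affiliation wicri:level="1"><country>Inde</country></affiliation>
</author>
</titleStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Information retrieval</term>
<term>Optical character recognition</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
</TEI>
</record>"""


def summarize(xml_text: str) -> dict:
    """Extract the title, author names and English keywords from a record."""
    root = ET.fromstring(xml_text)
    title_stmt = root.find("./TEI/teiHeader/fileDesc/titleStmt")
    return {
        "title": title_stmt.findtext("title"),
        "authors": [name.text for name in title_stmt.findall("author/name")],
        # Select only the English keyword list by its scheme attribute.
        "keywords": [
            term.text
            for term in root.findall(".//keywords[@scheme='KwdEn']/term")
        ],
    }
```

Calling `summarize(RECORD)` returns a dictionary with the keys `title`, `authors` and `keywords`; the same paths should apply to the full record once its namespace prefixes are declared.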

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000A82 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000A82 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:10-0181228
   |texte=   Feature string-based intelligent information retrieval from Tamil document images
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024